A bioinformatic platform to integrate target capture and whole genome sequences of various read depths for phylogenomics

2021 Dec; 30(23): 6021-6035. [epub] 20211031

Language: English. Country: England, United Kingdom. Medium: print-electronic

Document type: journal article, grant-supported work

Persistent link: https://www.medvik.cz/link/pmid34674330

Grant support
2017-04980 Swedish Research Council
2019-04739 Swedish Research Council
Swedish Foundation for Strategic Research
Royal Botanic Gardens, Kew
GJ20-18566Y Grant Agency of the Czech Republic
MARIPOSAS-704035 Marie Skłodowska-Curie Fellowship of the European Commission

The increasing availability of short-read whole genome sequencing (WGS) provides unprecedented opportunities to study ecological and evolutionary processes. Although loci of interest can be extracted from WGS data and combined with target sequence data, this requires suitable bioinformatic workflows. Here, we test different assembly and locus extraction strategies and implement them in secapr, a pipeline that processes short-read data into multilocus alignments for phylogenetics and molecular ecology analyses. We integrate the processing of data from low-coverage WGS (<30×) and target sequence capture into a flexible framework, while optimizing de novo contig assembly and locus extraction. Specifically, we test different assembly strategies by contrasting their ability to recover loci from targeted butterfly protein-coding genes, using four data sets: a WGS data set at three different average coverages (10×, 5× and 2×) and a data set for which these loci were enriched prior to sequencing via target sequence capture. Using the resulting de novo contigs, we account for potential errors within contigs and infer phylogenetic trees to evaluate the ability of each assembly strategy to recover species relationships. We demonstrate that using multiple k-mer sizes simultaneously for assembly results in the highest yield of extracted loci from de novo assembled contigs, while data sets derived from sequencing read depths as low as 5× recover the expected species relationships in phylogenetic trees. By making the tested assembly approaches available in the secapr pipeline, we hope to inspire future studies to incorporate complementary data and make an informed choice on the optimal assembly strategy.
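To make the coverage figures in the abstract concrete, the sketch below computes an expected average read depth using the standard approximation (number of reads × read length / genome size) and the fraction of reads that would have to be retained to reach a lower target depth. The genome size, read length and read count are hypothetical illustration values, not numbers from the study, and the function names are invented here; this is not part of secapr, and the abstract does not say how the 10×, 5× and 2× data sets were produced.

```python
def average_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average read depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def keep_fraction(current_cov: float, target_cov: float) -> float:
    """Fraction of reads to retain to reduce current_cov to target_cov."""
    if current_cov <= 0:
        raise ValueError("current_cov must be positive")
    if target_cov > current_cov:
        raise ValueError("target coverage cannot exceed current coverage")
    return target_cov / current_cov


# Hypothetical example values (assumptions for illustration only).
GENOME_SIZE = 400_000_000   # bp
READ_LENGTH = 100           # bp
N_READS = 40_000_000        # total reads

full_cov = average_coverage(N_READS, READ_LENGTH, GENOME_SIZE)
fractions = {target: keep_fraction(full_cov, target) for target in (10.0, 5.0, 2.0)}

if __name__ == "__main__":
    print(f"average coverage: {full_cov:.1f}x")
    for target, frac in fractions.items():
        print(f"target {target:.0f}x: keep {frac:.0%} of reads")
```

Under these assumed values the full data set sits at 10×, so reaching 5× or 2× would mean keeping half or one fifth of the reads, respectively.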
