Physcraper: a Python package for continually updated phylogenetic trees using the Open Tree of Life

BMC Bioinformatics. 2021 Jun 29;22(1):355. [Epub 2021 Jun 29]

Language: English. Country: United Kingdom, England. Medium: electronic

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid34187366

Grant support
1759838 National Science Foundation
1759846 National Science Foundation

Links

PubMed 34187366
PubMed Central PMC8244228
DOI 10.1186/s12859-021-04274-6
PII: 10.1186/s12859-021-04274-6

BACKGROUND: Phylogenies are a key part of research in many areas of biology. Tools that automate parts of the phylogenetic reconstruction process, mainly molecular character matrix assembly, have been developed for the benefit of both phylogenetics specialists and non-specialists. However, interpreting results, comparing them with previously available phylogenetic hypotheses, and selecting one phylogeny for downstream analyses and discussion remain difficult for researchers who are not specialists in phylogenetic methods or in the particular group under study.

RESULTS: Physcraper is a command-line Python program that automates the update of published phylogenies by adding public DNA sequences to the alignments underlying those phylogenies. It also provides a framework for straightforward comparison of published phylogenies with their updated versions, leveraging tools from the Open Tree of Life project to link taxonomic information across databases. Non-specialists can use the program to generate phylogenetic hypotheses based on publicly available expert phylogenetic knowledge. Phylogeneticists and taxonomic group specialists will find it useful for gathering molecular datasets and comparing alternative phylogenetic hypotheses (topologies).

CONCLUSION: The Physcraper workflow showcases the benefits of open science for phylogenetics and encourages researchers to strive for better scientific sharing practices. Physcraper can be used on any operating system and is released under an open-source license. Detailed instructions for installation and usage are available at https://physcraper.readthedocs.io.
