PhyloFisher: A phylogenomic package for resolving eukaryotic relationships
Language: English; Country: United States of America; Medium: electronic-ecollection
Document type: evaluation study, journal article, grant-supported research, Research Support, U.S. Gov't, Non-P.H.S.
PubMed: 34358228
PubMed Central: PMC8345874
DOI: 10.1371/journal.pbio.3001365
PII: PBIOLOGY-D-20-02379
- MeSH
- Eukaryota / genetics
- phylogeny *
- software *
- Publication type
- journal article
- evaluation study
- grant-supported research
- Research Support, U.S. Gov't, Non-P.H.S.
Phylogenomic analyses of hundreds of protein-coding genes aimed at resolving phylogenetic relationships are now common practice. However, no software currently exists that includes tools for dataset construction and subsequent analysis with diverse validation strategies to assess robustness. Furthermore, there are no publicly available high-quality curated databases designed to assess deep (>100 million years) relationships in the tree of eukaryotes. To address these issues, we developed an easy-to-use software package, PhyloFisher (https://github.com/TheBrownLab/PhyloFisher), written in Python 3. PhyloFisher includes a manually curated database of 240 protein-coding genes from 304 eukaryotic taxa covering known eukaryotic diversity, a novel tool for ortholog selection, and utilities that perform the diverse analyses required by state-of-the-art phylogenomic investigations. Through phylogenetic reconstructions of the tree of eukaryotes and of the Saccharomycetaceae clade of budding yeasts, we demonstrate the utility of the PhyloFisher workflow and the provided starting database to address phylogenetic questions across a large range of evolutionary time points for diverse groups of organisms. We also demonstrate that undetected paralogy can remain in phylogenomic "single-copy orthogroup" datasets constructed using widely accepted methods such as all-vs.-all BLAST searches followed by Markov Cluster Algorithm (MCL) clustering and application of automated tree pruning algorithms. Finally, we show how the PhyloFisher workflow helps detect inadvertent paralog inclusions, allowing the user to make more informed decisions regarding orthology assignments and leading to a more accurate final dataset.
Department of Biochemistry and Molecular Biology Dalhousie University Halifax Canada
Department of Biological Sciences Vanderbilt University Nashville Tennessee United States of America
Department of Biology and Ecology Faculty of Science University of Ostrava Ostrava Czech Republic
Department of Organismal Biology Uppsala University Uppsala Sweden
Faculty of Science University of South Bohemia České Budějovice Czech Republic
Institute of Parasitology Biology Centre Czech Academy of Sciences České Budějovice Czech Republic
Leibniz Institute of Freshwater Ecology and Inland Fisheries Ecosystem Research Berlin Germany
Science for Life Laboratory Uppsala University Uppsala Sweden
Unité d'Ecologie Systématique et Evolution CNRS Université Paris Saclay Paris France