PREMISE: Custom probe design for target enrichment in phylogenetics is tedious and often hinders broader phylogenetic synthesis. The universal angiosperm probe set Angiosperms353 may be the solution. Here, we test the performance of Angiosperms353 on the Rosaceae subtribe Malinae in comparison with custom probes that we designed specifically for this clade. We then assess how the performance of Angiosperms353 changes when the original probe sequences are replaced bioinformatically with orthologs extracted from the Malus domestica genome.
METHODS: To evaluate the relative performance of these probe sets, we compared the enrichment efficiency, locus recovery, alignment length, proportion of parsimony-informative sites, proportion of potential paralogs, the topology and support of the resulting species trees, and the gene tree discordance.
RESULTS: Locus recovery was highest for our custom Malinae probe set, and replacing the original Angiosperms353 sequences with a Malus representative improved the locus recovery relative to Angiosperms353. The proportion of parsimony-informative sites was similar among all probe sets, while the gene tree discordance was lower for the custom probes.
DISCUSSION: A custom probe set benefits from data completeness and can be tailored to the specifics of a given project; however, Angiosperms353 was as phylogenetically informative as the custom probes. We therefore recommend using both a custom probe set and Angiosperms353 to facilitate large-scale systematic studies, where financially possible.
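One of the metrics compared above, the proportion of parsimony-informative sites, can be illustrated with a minimal sketch. It assumes the usual definition (a site is informative when at least two character states each occur in at least two sequences) and assumes that gaps and the characters `?` and `N` are treated as missing data. The function names and the toy alignment are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter

# Assumption: gaps and fully ambiguous characters are ignored as missing data.
MISSING = set("-?N")


def is_parsimony_informative(column):
    """Return True if at least two states each occur in at least two sequences."""
    counts = Counter(c for c in (ch.upper() for ch in column) if c not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def proportion_informative(alignment):
    """Fraction of alignment columns that are parsimony-informative."""
    if not alignment:
        return 0.0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    if length == 0:
        return 0.0
    # zip(*alignment) iterates over the alignment column by column.
    informative = sum(is_parsimony_informative(col) for col in zip(*alignment))
    return informative / length


# Hypothetical toy alignment: columns 2 and 5 are informative, so 2 of 5 sites.
toy_alignment = ["ACGTA", "ACGTC", "ATGAA", "ATG-C"]
print(proportion_informative(toy_alignment))
```

Computed per locus, this value can be compared across probe sets in the same way as alignment length or locus recovery.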
- Keywords
- Angiosperms353, Malinae, customized probe set, target enrichment, universal probe set
- Publication type
- Journal Article
The analysis of target enrichment data in phylogenetics is not optimized for using paralogues in phylogenetic reconstruction. We developed a novel approach for detecting paralogues and using them for phylogenetic tree inference: both orthologous and paralogous copies are retrieved and sorted into orthologous alignments, from which the gene trees are built. We implemented this approach in ParalogWizard and demonstrate its performance in plant groups that underwent a whole genome duplication relatively recently: the subtribe Malinae (family Rosaceae), using Angiosperms353 as well as Malinae481 probes; the genus Oritrophium (family Asteraceae), using Compositae1061 probes; and the genus Amomum (family Zingiberaceae), using Zingiberaceae1180 probes. Discriminating between orthologues and paralogues reduced gene tree discordance and increased the species tree support in the case of the Malinae, but not for Oritrophium and Amomum. This may relate to the difference in the proportion of paralogous loci between the data sets, which was highest for the Malinae. Overall, retrieving paralogues for phylogenetic reconstruction with the ParalogWizard approach has the potential to increase the species tree support and reduce gene tree discordance in target enrichment data, particularly if the proportion of paralogous loci is high.
- Keywords
- angiosperms, bioinformatics/phyloinformatics, paralogy, species tree
- MeSH
- Phylogeny
- Genome *
- Publication type
- Journal Article