HybPhyloMaker: Target Enrichment Data Analysis From Raw Reads to Species Trees
Status: PubMed-not-MEDLINE; Language: English; Country: United States; Medium: electronic-ecollection
Document type: journal article
PubMed: 29348708
PubMed Central: PMC5768271
DOI: 10.1177/1176934317742613
PII: 10.1177_1176934317742613
- Keywords
- Target enrichment, genome skimming, locus selection, phylogenomics, species tree
- Publication type
- Journal article
SUMMARY: Hybridization-based target enrichment in combination with genome skimming (Hyb-Seq) is becoming a standard method of phylogenomics. We developed HybPhyloMaker, a bioinformatics pipeline that performs target enrichment data analysis from raw reads to supermatrix-, supertree-, and multispecies coalescent-based species tree reconstruction. HybPhyloMaker is written in BASH and integrates common bioinformatics tools. It can be launched both locally and on a high-performance computer cluster. Compared with existing target enrichment data analysis pipelines, HybPhyloMaker offers the following main advantages: implementation of all steps of data analysis from raw reads to species tree reconstruction; calculation and summary of alignment and gene tree properties that assist the user in the selection of "quality-filtered" genes; implementation of several species tree reconstruction methods; and analysis of the coding regions of organellar genomes.
AVAILABILITY: The HybPhyloMaker scripts, manual, and a test data set are available at https://github.com/tomas-fer/HybPhyloMaker/. HybPhyloMaker is licensed under the open-source GPL v.3 license, which allows further modification.
Department of Botany, Faculty of Science, Charles University, Prague, Czech Republic
Institute of Botany, Czech Academy of Sciences, Průhonice, Czech Republic