HybPhyloMaker: Target Enrichment Data Analysis From Raw Reads to Species Trees

. 2018 ; 14 () : 1176934317742613. [epub] 20180112

Status PubMed-not-MEDLINE Jazyk angličtina Země Spojené státy americké Médium electronic-ecollection

Typ dokumentu časopisecké články

Perzistentní odkaz   https://www.medvik.cz/link/pmid29348708
Odkazy

PubMed 29348708
PubMed Central PMC5768271
DOI 10.1177/1176934317742613
PII: 10.1177_1176934317742613
Knihovny.cz E-zdroje

SUMMARY: Hybridization-based target enrichment in combination with genome skimming (Hyb-Seq) is becoming a standard method of phylogenomics. We developed HybPhyloMaker, a bioinformatics pipeline that performs target enrichment data analysis from raw reads to supermatrix-, supertree-, and multispecies coalescent-based species tree reconstruction. HybPhyloMaker is written in BASH and integrates common bioinformatics tools. It can be launched both locally and on a high-performance computer cluster. Compared with existing target enrichment data analysis pipelines, HybPhyloMaker offers the following main advantages: implementation of all steps of data analysis from raw reads to species tree reconstruction, calculation and summary of alignment and gene tree properties that assist the user in the selection of "quality-filtered" genes, implementation of several species tree reconstruction methods, and analysis of the coding regions of organellar genomes. AVAILABILITY: The HybPhyloMaker scripts, manual as well as a test data set, are available in https://github.com/tomas-fer/HybPhyloMaker/. HybPhyloMaker is licensed under open-source license GPL v.3 allowing further modifications.

Zobrazit více v PubMed

Mandel JR, Dikow RB, Funk VA, et al. A target enrichment method for gathering phylogenetic information from hundreds of loci: an example from the Compositae. Appl Plant Sci. 2014;2:1300085. PubMed PMC

Weitemier K, Straub SCK, Cronn RC, et al. Hyb-Seq: combining target enrichment and genome skimming for plant phylogenomics. Appl Plant Sci. 2014;2:1400042. PubMed PMC

Nicholls JA, Pennington RT, Koenen EJ, et al. Using targeted enrichment of nuclear genes to increase phylogenetic resolution in the neotropical rain forest genus Inga (Leguminosae: Mimosoideae). Front Plant Sci. 2015;6:710. PubMed PMC

Lemmon EM, Lemmon AR. High-throughput genomic data in systematics and phylogenetics. Ann Rev Ecol Evol Syst. 2013;44:99–121.

Heyduk K, Stephens JD, Faircloth BC, Glenn TC. Targeted DNA region re-sequencing. In: Aransay AM, Lavín Trueba JL, eds. Field Guidelines for Genetic Experimental Designs in High-Throughput Sequencing. Berlin, Germany: Springer; 2016:43–68.

Faircloth BC. PHYLUCE is a software package for the analysis of conserved genomic loci. Bioinformatics. 2016;32:786–788. PubMed

Johnson MG, Gardner EM, Liu Y, et al. HybPiper: extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment. Appl Plant Sci. 2016;4:1600016. PubMed PMC

Faircloth BC, McCormack JE, Crawford NG, Harvey MG, Brumfield RT, Glenn TC. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 2012;61:717–726. PubMed

Faircloth BC, Sorenson L, Santini F, Alfaro ME. A phylogenomic perspective on the radiation of ray-finned fishes based upon targeted sequencing of ultraconserved elements (UCEs). PLoS ONE. 2013;8:e65923. PubMed PMC

Reneker J, Lyons E, Conant GC, et al. Long identical multispecies elements in plant and animal genomes. Proc Natl Acad Sci U S A. 2012;109:E1183–1191. PubMed PMC

Schmickl R, Liston A, Zeisek V, et al. Phylogenetic marker development for target enrichment from transcriptome and genome skim data: the pipeline and its application in southern African Oxalis (Oxalidaceae). Mol Ecol Resour. 2016;16:1124–1135. PubMed

Stephens JD, Rogers WL, Mason CM, Donovan LA, Malmberg RL. Species tree estimation of diploid Helianthus (Asteraceae) using target enrichment. Amer J Bot. 2015;102:910–920. PubMed

Folk RA, Mandel JR, Freudenstein JV. A protocol for targeted enrichment of intron-containing sequence markers for recent radiations: a phylogenomic example with genomic resources from Heuchera (Saxifragaceae). Appl Plant Sci. 2015;3:1500039. PubMed PMC

Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–359. PubMed PMC

Li H, Handsaker B, Wysoker A, et al; 1000. Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. PubMed PMC

HudsonAlpha. bam2fastq. http://gsl.hudsonalpha.org/information/software/bam2fastq/, 2010. Accessed June 20, 2017.

Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. PubMed PMC

Xu H, Luo X, Qian J, et al. FastUniq: a fast de novo duplicates removal tool for paired short reads. PLoS ONE. 2012;7:e52249. PubMed PMC

Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics. 2009;25:1754–1760. PubMed PMC

Břinda K, Boeva V, Kucherov G. Dynamic read mapping and online consensus calling for better variant detection. arXiv.org. 2016:160509070. https://arxiv.org/abs/1605.09070.

Constantinides B. Kindel: indel-aware consensus calling. https://github.com/bede/kindel, 2017. Accessed June 20, 2017.

Kent WJ. BLAT—the BLAST-like alignment tool. Genome Res. 2002;12:656–664. PubMed PMC

Katoh K, Toh H. Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. 2008;9:286–298. PubMed

Nylander J. catfasta2phyml. https://github.com/nylander/catfasta2phyml/, 2016. Accessed June 20, 2017.

Borowiec ML. AMAS: a fast tool for alignment manipulation and computing of summary statistics. Peer J. 2016;4:e1660. PubMed PMC

Collet G. MstatX. https://github.com/gcollet/MstatX/, 2012. Accessed June 20, 2017.

Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–1973. PubMed PMC

Price MN, Dehal PS, Arkin AP. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE. 2010;5:e9490. PubMed PMC

Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. PubMed PMC

Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010;59:307–321. PubMed

Borowiec M. good_genes. https://github.com/marekborowiec/good_genes, 2016. Accessed June 20, 2017.

Paradis E, Claude J, Strimmer K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics. 2004;20:289–290. PubMed

Charif D, Lobry JR. SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. In: Bastolla U, Porto M, Roman HE, Vendruscolo M, eds. Structural Approaches to Sequence Evolution (Series Biological and Medical Physics, Biomedical Engineering). Berlin, Germany: Springer; 2007:207–232.

Philippe H, Forterre P. The rooting of the universal tree of life is not reliable. J Mol Evol. 1999;49:509–523. PubMed

Junier T, Zdobnov EM. The Newick Utilities: high-throughput phylogenetic tree processing in the UNIX shell. Bioinformatics. 2010;26:1669–1670. PubMed PMC

Mirarab S, Reaz R, Bayzid MS, Zimmermann T, Swenson MS, Warnow T. ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics. 2014;30:i541–i548. PubMed PMC

Vachaspati P, Warnow T. ASTRID: Accurate Species TRees from Internode Distances. BMC Genomics. 2015;16:S3 PubMed PMC

Nguyen N, Mirarab S, Warnow T. MRL and SuperFine+MRL: new supertree methods. Algorithms Mol Biol. 2012;7:3. PubMed PMC

Kozlov AM, Aberer AJ, Stamatakis A. ExaML version 3: a tool for phylogenomic analyses on supercomputers. Bioinformatics. 2015;31:2577–2579. PubMed PMC

Lanfear R, Calcott B, Ho SY, Guindon S. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 2012;29:1695–1701. PubMed

Lanfear R, Calcott B, Kainer D, Mayer C, Stamatakis A. Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evol Biol. 2014;14:82. PubMed PMC

Kearse M, Moir R, Wilson A, et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28:1647–1649. PubMed PMC

Faircloth B. Illumiprocessor: parallel adapter and quality trimming. https://illumiprocessor.readthedocs.io/en/latest/, 2013. Accessed June 20, 2017.

Petersen KR, Streett DA, Gerritsen AT, Hunter SS, Settles ML. Super deduper, fast PCR duplicate detection in fastq files. Paper presented at: Proceedings of the 6th ACM Conference On Bioinformatics, Computational Biology and Health Informatics; September 9-12, 2015; Atlanta, GA:491–492. New York, NY: ACM.

Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol İ. ABySS: a parallel assembler for short read sequence data. Genome Res. 2009;19:1117–1123. PubMed PMC

Grabherr MG, Haas BJ, Yassour M, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29:644–652. PubMed PMC

Bankevich A, Nurk S, Antipov D, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19:455–477. PubMed PMC

Aberer AJ, Kobert K, Stamatakis A. ExaBayes: massively parallel Bayesian tree inference for the whole-genome era. Mol Biol Evol. 2014;31:2553–2556. PubMed PMC

Slater GS, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005;6:31. PubMed PMC

Harris RS. Improved Pairwise Alignment of Genomic DNA [PhD thesis]. State College, PA: The Pennsylvania State University; 2007.

Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18:821–829. PubMed PMC

Doyle JJ, Doyle JL, Rauscher JT, Brown AHD. Diploid and polyploid reticulate evolution throughout the history of the perennial soybeans (Glycine subgenus Glycine). New Phytol. 2004;161:121–132.

Kamneva OK, Syring J, Liston A, Rosenberg NA. Evaluating allopolyploid origins in strawberries (Fragaria) using haplotypes generated from target capture sequencing. BMC Evol Biol. 2017;17:180. PubMed PMC

Najít záznam

Citační ukazatele

Nahrávání dat ...

Možnosti archivace

Nahrávání dat ...