Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding
Status: PubMed-not-MEDLINE
Language: English
Country: Bulgaria
Medium: electronic-ecollection
Document type: journal articles
PubMed: 30271256
PubMed Central: PMC6160831
DOI: 10.3897/mycokeys.39.28109
- Keywords
- Microbial communities, amplicon sequencing, fungal biodiversity, metagenomics, microbiome, mycobiome
- Publication type
- journal articles (MeSH)
Recent developments in high-throughput sequencing (HTS) technologies have led to a rapid accumulation of HTS data. This has created a growing need for, and interest in, tools for processing and communicating these data. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with its own features, assumptions and outputs. To evaluate how the choice of bioinformatics workflow may affect the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a given bioinformatics process depend largely on the platform used. None of the bioinformatics workflows appeared to filter out the accumulated errors and generate Operational Taxonomic Units (OTUs) perfectly. However, PipeCraft, LotuS and PIPITS performed better than QIIME2 and Galaxy on the tested fungal amplicon data set. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.
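The abstract's closing recommendation is to validate OTUs manually by examining their taxonomy assignment values. A simple triage step can support this by listing the OTUs that deserve a closer look. The sketch below is a minimal illustration, not the authors' procedure. It assumes that each OTU's best reference hit has already been summarised as percent identity, query coverage and e-value, for example from a BLAST search against a reference database. The `OTUAssignment` structure, the function names and all three cut-off values are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class OTUAssignment:
    """Best reference hit for one OTU (hypothetical record structure)."""
    otu_id: str
    taxon: str
    identity: float  # percent identity to the best reference hit
    coverage: float  # percent of the OTU sequence covered by the alignment
    evalue: float    # e-value of the best hit


# Illustrative cut-offs only: these are assumptions, not values from the study.
MIN_IDENTITY = 90.0
MIN_COVERAGE = 70.0
MAX_EVALUE = 1e-50


def review_reasons(a: OTUAssignment) -> list:
    """List every reason why an assignment looks too weak to accept unchecked."""
    reasons = []
    if a.identity < MIN_IDENTITY:
        reasons.append(f"identity {a.identity:.1f}% < {MIN_IDENTITY:.1f}%")
    if a.coverage < MIN_COVERAGE:
        reasons.append(f"coverage {a.coverage:.1f}% < {MIN_COVERAGE:.1f}%")
    if a.evalue > MAX_EVALUE:
        reasons.append(f"e-value {a.evalue:.0e} > {MAX_EVALUE:.0e}")
    return reasons


def flag_for_manual_review(assignments) -> dict:
    """Map otu_id -> reasons for each OTU that needs manual inspection."""
    flagged = {}
    for a in assignments:
        reasons = review_reasons(a)
        if reasons:
            flagged[a.otu_id] = reasons
    return flagged


if __name__ == "__main__":
    # Made-up example values, not data from the study.
    otus = [
        OTUAssignment("OTU_1", "Russula sp.", 99.2, 100.0, 1e-150),
        OTUAssignment("OTU_2", "Fungi sp.", 78.5, 95.0, 1e-40),
        OTUAssignment("OTU_3", "Mortierella sp.", 96.0, 45.0, 1e-60),
    ]
    for otu_id, reasons in flag_for_manual_review(otus).items():
        print(otu_id, "->", "; ".join(reasons))
```

The flags only shortlist OTUs for a person to inspect and never discard anything automatically. This matches the abstract's point that no platform filtered errors perfectly. Suitable thresholds depend on the marker, the reference database and the study, so they should be set deliberately rather than copied from this sketch.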
Department of Ecology Swedish University of Agricultural Sciences Ulls väg 16 756 51 Uppsala Sweden
Institute of Ecology and Earth Science Tartu University 14a Ravila 50411 Tartu Estonia
Natural History Museum of Tartu University 14a Ravila 50411 Tartu Estonia
Technical University of Munich Am Coulombwall 3 85748 Garching Germany