PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S ribosomal RNA, ITS, and COI marker genes

2020 Mar 01; 9(3).

Language: English. Country: United States. Medium: print.

Document type: journal article; research supported by a grant.

Persistent link: https://www.medvik.cz/link/pmid32161947

BACKGROUND: Environmental DNA and metabarcoding allow the identification of a mixture of species and are launching a new era in bio- and eco-assessment. Many steps are required to obtain taxonomically assigned matrices from raw data. For most of these steps, a plethora of tools is available, and each tool's execution parameters need to be tailored to the particularities of each experiment. Adding to this complexity, such analyses frequently require the computational capacity of high-performance computing systems. To address these difficulties, bioinformatic pipelines need to combine state-of-the-art technologies and algorithms with a framework that is easy to obtain, set up, and use, allowing researchers to tune the analysis to each study. Software containerization technologies ease the sharing and running of software packages across operating systems; thus, they strongly facilitate pipeline development and usage. Likewise, programming languages specialized for big data pipelines incorporate features such as roll-back checkpoints and on-demand partial pipeline execution.

FINDINGS: PEMA is a containerized assembly of key metabarcoding analysis tools that requires little effort to set up, run, and customize to researchers' needs. Based on third-party tools, PEMA performs read pre-processing, (molecular) operational taxonomic unit clustering, amplicon sequence variant inference, and taxonomy assignment for 16S and 18S ribosomal RNA, as well as ITS and COI marker gene data. Owing to its simplified parameterization and checkpoint support, PEMA allows users to explore alternative algorithms for specific steps of the pipeline without the need for a complete re-execution. PEMA was evaluated against both mock communities and previously published datasets and achieved results of comparable quality.

CONCLUSIONS: A high-performance computing-based approach was used to develop PEMA; however, it can be used on personal computers as well. PEMA's time-efficient performance and good results will allow it to be used for accurate environmental DNA metabarcoding analysis, thus enhancing the applicability of next-generation biodiversity assessment studies.
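The checkpoint-based partial re-execution mentioned in the abstract can be illustrated with a small Python sketch. This is not PEMA's implementation, which the record does not describe; the class, the step functions (filter_reads, dereplicate, drop_rare) and the in-memory cache are hypothetical stand-ins. Each step's output is stored under a hash of the step name, its parameters and the key of the step before it, so changing one step's parameters invalidates only that step and the steps after it.

```python
import hashlib
import json
from typing import Any, Callable

# Hypothetical step: (name, function, parameters). Each function receives the
# previous step's output and its own parameter dict, and returns a new output.
Step = tuple[str, Callable[[Any, dict], Any], dict]


class CheckpointedPipeline:
    """Toy pipeline that checkpoints every step's output in memory."""

    def __init__(self, steps: list[Step]):
        self.steps = steps
        self.cache: dict[str, Any] = {}  # checkpoint store: key -> step output
        self.executed: list[str] = []    # steps recomputed during the last run

    @staticmethod
    def _key(parent: str, name: str, params: dict) -> str:
        # The key chains the upstream key, so a change propagates downstream only.
        payload = json.dumps([parent, name, params], sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def run(self, data: Any) -> Any:
        self.executed = []
        key = self._key("", "input", {"data": repr(data)})
        for name, func, params in self.steps:
            key = self._key(key, name, params)
            if key not in self.cache:       # no checkpoint yet: compute and store
                self.cache[key] = func(data, params)
                self.executed.append(name)
            data = self.cache[key]          # continue from the checkpointed output
        return data

    def set_params(self, name: str, **params: Any) -> None:
        """Change one step's parameters; upstream checkpoints remain valid."""
        self.steps = [(n, f, {**p, **params} if n == name else p)
                      for n, f, p in self.steps]


# Hypothetical stand-ins for real pipeline steps.
def filter_reads(reads: list[str], p: dict) -> list[str]:
    # keep reads at least p["min_len"] bases long (toy read pre-processing)
    return [r for r in reads if len(r) >= p["min_len"]]


def dereplicate(reads: list[str], p: dict) -> list[tuple[str, int]]:
    # collapse identical sequences into (sequence, count), most abundant first
    counts: dict[str, int] = {}
    for r in reads:
        counts[r] = counts.get(r, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def drop_rare(uniques: list[tuple[str, int]], p: dict) -> list[tuple[str, int]]:
    # toy downstream step whose parameter the user wants to explore
    return [(s, c) for s, c in uniques if c >= p["min_count"]]


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACGT", "TTGCAA", "ACGTAC", "TTGCAA", "GGCC"]
    pipe = CheckpointedPipeline([
        ("filter", filter_reads, {"min_len": 5}),
        ("dereplicate", dereplicate, {}),
        ("drop_rare", drop_rare, {"min_count": 3}),
    ])
    print(pipe.run(reads), pipe.executed)
    # [('ACGTAC', 3)] ['filter', 'dereplicate', 'drop_rare']
    pipe.set_params("drop_rare", min_count=2)
    print(pipe.run(reads), pipe.executed)
    # [('ACGTAC', 3), ('TTGCAA', 2)] ['drop_rare']
```

In the second run only drop_rare is recomputed; the filtering and dereplication outputs are reused from their checkpoints. Chaining each key to its predecessor is what lets a parameter change invalidate downstream steps while leaving upstream results untouched.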

An erratum to this article has been published.
